Finding top-k elements in a time-sliding window

Authors

  • Nuno Homem
  • João Paulo Carvalho
Abstract

Identifying the top-k most frequent elements is one of the many problems associated with data stream analysis. It is a well-known and difficult problem, especially if the analysis must be performed and kept up to date in near real time. Analyzing data streams in a time-sliding window model is of particular interest, as only the most recent, and therefore most relevant, events are considered. Approximate answers are usually adequate when dealing with this problem. This paper presents a new algorithm, Filtered Space-Saving with Sliding Window (FSW), which addresses this problem by introducing an approximate time-sliding window counter into the Filtered Space-Saving (FSS) algorithm. The algorithm provides the top-k list of elements, their frequencies, and an error estimate for each frequency value within the sliding window. It provides strong guarantees on the results, depending on the elements' real frequencies. Experimental results detail performance on real-life cases.
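As a rough illustration of the kind of output the abstract describes (a top-k list with a frequency and an error estimate per element), the following is a minimal sketch of plain Space-Saving, the counter-based scheme that the Filtered Space-Saving name refers to. It is not the FSW algorithm from the paper: it has no filter and no sliding window, and it counts over the whole stream. The class name `SpaceSaving`, the `capacity` parameter, and the method names are assumptions made for this example.

```python
from typing import Dict, Hashable, Iterable, List, Tuple


class SpaceSaving:
    """Plain Space-Saving summary (hypothetical helper, not the paper's FSW).

    Keeps at most `capacity` monitored elements. Each has an estimated
    count and an error bound (the maximum possible overestimation).
    """

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.counts: Dict[Hashable, int] = {}
        self.errors: Dict[Hashable, int] = {}

    def add(self, item: Hashable) -> None:
        if item in self.counts:
            # Already monitored: just count the hit.
            self.counts[item] += 1
        elif len(self.counts) < self.capacity:
            # Free slot: start an exact counter.
            self.counts[item] = 1
            self.errors[item] = 0
        else:
            # Table full: evict the element with the smallest count and
            # let the newcomer inherit that count as its possible error.
            victim = min(self.counts, key=self.counts.get)
            floor = self.counts.pop(victim)
            del self.errors[victim]
            self.counts[item] = floor + 1
            self.errors[item] = floor

    def extend(self, items: Iterable[Hashable]) -> None:
        for item in items:
            self.add(item)

    def top_k(self, k: int) -> List[Tuple[Hashable, int, int]]:
        """Return up to k (element, estimated_count, error) triples."""
        ranked = sorted(self.counts.items(), key=lambda kv: kv[1], reverse=True)
        return [(elem, count, self.errors[elem]) for elem, count in ranked[:k]]


summary = SpaceSaving(capacity=3)
summary.extend("abacabda")
print(summary.top_k(2))
```

In this scheme an element's true frequency lies between `count - error` and `count`, which is one way a per-element error estimate can be reported. Restricting the counts to a time-sliding window, as the paper does, would need additional machinery that this sketch does not attempt.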

Similar resources

Mining Frequent Patterns in Uncertain and Relational Data Streams using the Landmark Windows

Today, in many modern applications, we search for frequent and repeating patterns in the analyzed data sets. In this search, we look for patterns that frequently appear in the data set and mark them as frequent patterns to enable users to make decisions based on these discoveries. Most algorithms presented in the context of data stream mining and frequent pattern detection work either on uncertai...


Probabilistic k-Skyband Operator over Sliding Windows

Given a set of data elements D in a d-dimensional space, a k-skyband query reports the set of elements which are dominated by at most k − 1 other elements in D. The k-skyband query is a fundamental query type in data analysis, as it keeps a minimum candidate set for all top-k ranking queries where the ranking functions are monotonic. In this paper, we study the problem of k-skyband over uncertain d...


FDiBC: A Novel Fraud Detection Method in Bank Club based on Sliding Time and Scores Window

One of the recent strategies for increasing customer loyalty in the banking industry is the use of a customers' club system. In this system, customers receive scores based on the financial and club activities they perform, and, according to the points achieved, they get credits from the bank. In addition, with the advent of new technologies, fraud is growing in the banking domain as well. Therefor...


SKYPE: Top-k Spatial-keyword Publish/Subscribe Over Sliding Window

With the prevalence of social media and GPS-enabled devices, a massive amount of geo-textual data is being generated in a streaming fashion, leading to a variety of applications such as location-based recommendation and information dissemination. In this paper, we investigate a novel real-time top-k monitoring problem over a sliding window of streaming data; that is, we continuously maintain the top-k...


A Sliding Window Method for Finding Recently Frequent Itemsets over Online Data Streams

A data stream is a massive unbounded sequence of data elements continuously generated at a rapid rate. Consequently, the knowledge embedded in a data stream is likely to change as time goes by. However, most mining algorithms or frequency approximation algorithms for a data stream are not able to extract recent changes of information in a data stream adaptively. This paper proposes a s...



Journal title:
  • Evolving Systems

Volume 2, Issue (not given)

Pages: (not given)

Publication date: 2011